Introduction


Take note of what James Moorer said: 

This is compute power far beyond what even the most 
starry-eyed fortune-teller could have imagined! It will 
change the very nature of what audio is, and what 
audio engineers do, since it changes what is possible 
at a fundamental level. 

HiPAC


We continue to propose HiPAC — High-Performance Audio 
Computing — as an important new domain of study that 
explores the potential for new advanced processor 
architectures to transform the current landscape of audio 
synthesis, processing and music composition. 

A key aspect of HiPAC from the point of view of the computer 
musician is that within a relatively short timescale, we can 
expect to see these technologies in consumer-grade hardware. 

The Technology - History


We used to have “supercomputing”: the Cray-1, the Connection
Machine, MasPar. We can distinguish
Single-Instruction-Multiple-Data (SIMD) machines and
Multiple-Instruction-Multiple-Data (MIMD) machines.

In the 1980s and 1990s one can find many experiments and 
suggestions. These were made obsolete by the brute power of 
faster machines. 

Now the speed increases are ending, and we need to return to 
being clever. 

Modern SIMD


Modern SIMD fine-grained parallel architectures are associated
today firstly with the vector extensions to standard CPUs,
and secondly with the graphics accelerator cards now essential
to all consumer workstations (especially where high performance
in games is required).


ffitch, Dobson, Bradford (University of Bath): HiPAC
Outline: Background, Hardware, Software, Example, Conclusions




From Supercomputers to the Desktop


As it gets harder to increase the speed of a processor,
manufacturers are providing multiple CPU cores on a chip:
normally 2 or 4 cores, with 8 promised soon, and the Intel Polaris
with 80 cores and a claimed performance of up to 1 teraflop, though
you will need to wait about 10 years for that.

There are also SIMD-style accelerator systems designed to work in
conjunction with a host computer. We have experience of
Clearspeed.

And there is the so-called GPGPU.

Earlier Attempts


In Audio:

• Transputer Array in Durham: 170 processors
• Midas in York: dataflow through SGI net

In Computer Science:

• Bath Parallel LISP Machine
• Timewarp
• ...etc

Software for Vector processing 


We have been working with Clearspeed plc to use their
accelerator in interesting ways.

They have a version of C extended with poly and mono
variables (similar to the MasPar system), and there are others.

But do not forget Amdahl’s law.

Amdahl’s Law


The relationship between sequential and parallel computation
is summarised in Amdahl’s law, which gives the achievable speedup as:

Speedup = 1 / (S + P/N)

where S is the fraction of serial computation, P = 1 - S is the
fraction of parallelisable computation and N is the number of
processors.
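
To make the formula concrete, here is a minimal Python sketch. The function name `amdahl_speedup` and the serial fraction of 0.1 are illustrative assumptions; the core counts are the ones mentioned on the earlier slides.

```python
def amdahl_speedup(serial_fraction, processors):
    """Return the speedup 1 / (S + P/N), with P = 1 - S."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed example: 10% of the work is serial (S = 0.1).
for n in (2, 4, 8, 80):
    print(n, "cores:", round(amdahl_speedup(0.1, n), 2))
```

However many processors are added, the speedup can never exceed 1/S, so with S = 0.1 the limit is a factor of 10.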

HiPAC Processors


The primary defining characteristics of a HiPAC DSP process:

• use of highly parallel fine-grained architectures (e.g.
  following the SIMD model), though we do not exclude more
  “conventional” multi-core computation
• real-time performance or better
• implies low latency
• ideal and “no-compromise” forms of algorithms
• new processes, and hence new effects and sounds, not
  simply “more of the same”, whether more reverbs or more
  voices.

A HiPAC case study - the Parallel Execution of Csound


We have proposed the Sliding Phase Vocoder (SPV) as a
canonical example of a HiPAC process.

But now I present here a new (and different) application:

MultiCore Csound

The challenge is to make sensible use of a multicore processor,
to provide more processing in real time.

This is work in progress, and not in the written paper. A weak
form may appear at ICMC2009.

We can compute different instances of instruments in parallel,
as long as they do not use the same resource and
as long as it is worth it.


Is it Worthwhile?

We are creating a database of instruction counts for each
opcode, parameterised by initialisation, instructions per sample and
instructions per control cycle.

There is no point in parallel execution if the overhead of threads
is comparable with the cost of the computation itself.

Opcode      init      Audio      Control
table.a       93     23.063       43.998
table.k       93          0           45
butterlp       9     29.005        5.478
butterhi      19     30.000           35
butterbp      20         30           71
oscil.ka      69         17           46
bilbar     371.5   1856.028           86
ags          497    917.921    79475.155
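
One way such a cost database might feed the "is it worthwhile?" decision is sketched below in Python. This is not the authors' implementation: the linear model (control cost plus audio cost times the number of samples per control cycle, here called `ksmps`), the function names and the `thread_overhead` figure are all assumptions made for illustration. Only the instruction counts are taken from the table.

```python
# (init, audio, control) instruction counts copied from the table above.
OPCODE_COSTS = {
    "table.a": (93, 23.063, 43.998),
    "oscil.ka": (69, 17, 46),
    "bilbar": (371.5, 1856.028, 86),
}


def cycle_cost(opcode, ksmps):
    """Estimated instructions for one control cycle (assumed linear model)."""
    _init, audio, control = OPCODE_COSTS[opcode]
    return control + audio * ksmps


def worth_parallelising(opcodes, ksmps, thread_overhead):
    """True when the estimated work clearly exceeds the threading overhead.

    thread_overhead is a hypothetical figure, in instructions, that would
    have to be measured on the target machine.
    """
    total = sum(cycle_cost(op, ksmps) for op in opcodes)
    return total > thread_overhead


print(worth_parallelising(["oscil.ka"], ksmps=10, thread_overhead=10000))
print(worth_parallelising(["bilbar"], ksmps=10, thread_overhead=10000))
```

Under these assumed numbers a single oscillator is too cheap to be worth a thread of its own, while an expensive opcode such as bilbar is not.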

In what ways can brute distribution be wrong?

• Clashing use of global variables
• Adding into the output bus
• Other global structures

Compiler technology allows us to identify the points of
contention.

Spin locks are used for the output bus and other shared structures.

Each k-cycle looks at a DAG of dependencies, and schedules
opcodes to maintain semantics.
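
As an illustration of guarding the output bus, here is a minimal Python sketch. Python's standard library has no spin lock, so `threading.Lock` stands in for one. The `OutputBus` class and `render_parallel` function are hypothetical names and are not part of Csound.

```python
import threading


class OutputBus:
    """Shared output buffer into which instrument instances add their audio."""

    def __init__(self, ksmps):
        self.samples = [0.0] * ksmps
        self._lock = threading.Lock()  # stand-in for a spin lock

    def add(self, block):
        # Only one thread at a time may accumulate into the bus.
        with self._lock:
            for i, x in enumerate(block):
                self.samples[i] += x


def render_parallel(blocks, ksmps):
    """Add each block to the bus from its own thread and return the mix."""
    bus = OutputBus(ksmps)
    threads = [threading.Thread(target=bus.add, args=(b,)) for b in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bus.samples


print(render_parallel([[1.0, 2.0], [0.5, 0.5], [1.0, 1.0]], ksmps=2))
```

Because addition into the bus is the only shared operation, the lock is held briefly, which is the situation in which a spin lock is usually preferred over a heavier mutex.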

Consider the fragment of Csound:

        instr 1
    a1  oscil  p4, p5, 1
        out    a1
        endin

        instr 2
    gk  oscil  p4, p5, 1
        endin

        instr 3
    a1  oscil  gk, p5, 1
        out    a1
        endin
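
The dependency analysis for this fragment can be sketched in Python as follows. This is an assumed simplification, not the scheduler used in the parallel Csound: each instrument is described only by the global variables it reads and writes, the output bus is left out because it is protected by a lock, and `parallel_stages` is a hypothetical name.

```python
def parallel_stages(instruments):
    """Group instruments into stages that may run in parallel.

    instruments is a list of (name, reads, writes) in execution order,
    where reads and writes are sets of global variable names. An instrument
    is placed one stage after the latest earlier instrument it conflicts with.
    """
    stage_of = {}
    stages = []
    for i, (name, reads, writes) in enumerate(instruments):
        level = 0
        for prev_name, prev_reads, prev_writes in instruments[:i]:
            conflict = (
                (reads & prev_writes)      # read after write
                or (writes & prev_reads)   # write after read
                or (writes & prev_writes)  # write after write
            )
            if conflict:
                level = max(level, stage_of[prev_name] + 1)
        stage_of[name] = level
        if level == len(stages):
            stages.append([])
        stages[level].append(name)
    return stages


# The fragment above: instr 2 writes gk, instr 3 reads it.
FRAGMENT = [
    ("instr 1", set(), set()),
    ("instr 2", set(), {"gk"}),
    ("instr 3", {"gk"}, set()),
]
print(parallel_stages(FRAGMENT))
# prints [['instr 1', 'instr 2'], ['instr 3']]
```

Instruments 1 and 2 share no global, so they can run together, while instrument 3 must wait for instrument 2 to have written gk.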



The current state is that it runs, but is not totally integrated. The cost
database is not used yet.

Some performance figures running on a dual-core machine:

Best so far is Electric Priest, which goes from 53s to 34s.
Xanadu is less good at only 10% gain.
Trapped in Convert shows only 16% gain.

But they are gains!
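
The Electric Priest timing can be read against Amdahl's law with a small Python sketch. The function names are illustrative, and the parallel fraction it reports is only what an ideal Amdahl model would imply for two cores; it ignores threading overhead and is not a measured quantity.

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-core to multi-core run time."""
    return serial_seconds / parallel_seconds


def implied_parallel_fraction(observed_speedup, processors):
    """Invert Amdahl's law, 1/speedup = (1 - P) + P/N, to solve for P."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / processors)


s = speedup(53.0, 34.0)
p = implied_parallel_fraction(s, 2)
print(round(s, 2), round(p, 2))
```

On this reading the run is about 1.56 times faster, which would correspond to roughly 72% of the work being executed in parallel.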

Conclusions


Parallel processing is coming apace, and if we do not embrace
it in all its forms we will again be the poor relations.

This hardware will soon be cheap enough. The “techies” must
grasp it.


Thanks to Codemist Ltd and Clearspeed plc. This work is unsupported by public
agencies.

The parallel Csound is the work of Chris Wilson for his BSc in Computer Science under
the supervision of John ffitch.